Thesis work, 30 credits (2 students) - Generative Learning in Drug Discovery
AstraZeneca is seeking two Master’s thesis students, applying together, to explore a central question in AI-driven medicinal chemistry: How much information do we truly need to transform an initial hit into a candidate drug? You will use deep learning–based generative models to accelerate the transition from weak hits to robust candidate drugs, quantifying the informational content required along the way. The work builds on earlier work by previous graduate students and uses AstraZeneca’s proprietary generative platform, ReInvent, alongside public benchmarking datasets.
About AstraZeneca:
AstraZeneca is a global, science-led, patient-centered biopharmaceutical company focusing on discovering, developing, and commercializing prescription medicines for some of the world’s most serious diseases. But we’re more than a leading global pharmaceutical company. At AstraZeneca, we’re dedicated to being a Great Place to Work and empowering employees to push the boundaries of science and fuel their entrepreneurial spirit.
About the Opportunity:
As a Thesis Worker at AstraZeneca, you’ll find an environment that’s full of unique opportunities and exciting challenges. Here, you’ll have the opportunity to pursue your areas of interest whilst equally developing a broad skillset and knowledge base to get the best out of your experience. You’ll be working on meaningful projects to make an impact and deliver real value for our patients and our business.
Thesis work description:
Generative learning, rooted in the idea that new knowledge builds on prior knowledge, has rapidly advanced de novo drug design. Reinforcement learning, autoencoders, and recurrent neural networks (RNNs) can generate novel molecules as SMILES strings directly from data, yet there is still no standardized, systematic route from modest hit compounds to candidate drugs. This thesis will cover the following areas:
Data curation:
- Internal datasets (established): Use retrospective AstraZeneca project data, building on existing code and workflows.
- Public datasets (new): Extend and generalize the methodology to open datasets relevant to hit-to-lead and lead optimization, enabling transparent benchmarking and reproducibility.
Generative modeling:
- Work with AstraZeneca’s proprietary generative AI model, ReInvent, for hit-to-candidate trajectories.
- Scoring functions:
  - Phase 1 (completed): Idealized in-silico scoring to model the optimal scenario.
  - Phase 2 (current): Introduce realistic, experimentally derived criteria (e.g., lipophilicity, solubility, biological activity).
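As a rough illustration of what a multi-criteria scoring function of the Phase 2 kind could look like, the sketch below combines lipophilicity, solubility, and biological activity into a single score between 0 and 1. It is not ReInvent's API: the function names, property keys (`logd`, `log_solubility`, `pic50`), target ranges, and weights are all hypothetical and chosen only for illustration. Each property is first mapped to a desirability value, and the values are then combined with a weighted geometric mean, so that one poor property lowers the overall score.

```python
import math


def sigmoid_transform(value, low, high):
    """Map a raw value to (0, 1); the score rises from about 0 at `low` to about 1 at `high`."""
    midpoint = (low + high) / 2.0
    steepness = 10.0 / (high - low)
    return 1.0 / (1.0 + math.exp(-steepness * (value - midpoint)))


def range_transform(value, low, high, softness=1.0):
    """Return 1.0 inside [low, high] and decay smoothly towards 0 outside it."""
    if low <= value <= high:
        return 1.0
    distance = low - value if value < low else value - high
    return math.exp(-((distance / softness) ** 2))


def aggregate(scores, weights):
    """Weighted geometric mean of component scores (clipped to avoid log(0))."""
    total_weight = sum(weights)
    log_sum = sum(w * math.log(max(s, 1e-12)) for s, w in zip(scores, weights))
    return math.exp(log_sum / total_weight)


def score_compound(props):
    """Score one compound from a dict of measured or predicted properties.

    All ranges and weights below are illustrative assumptions, not project criteria.
    """
    components = [
        (range_transform(props["logd"], 1.0, 3.0), 1.0),                  # lipophilicity window
        (sigmoid_transform(props["log_solubility"], -6.0, -4.0), 1.0),    # higher is better
        (sigmoid_transform(props["pic50"], 6.0, 8.0), 2.0),               # activity, weighted double
    ]
    scores = [s for s, _ in components]
    weights = [w for _, w in components]
    return aggregate(scores, weights)


if __name__ == "__main__":
    weak_hit = {"logd": 4.5, "log_solubility": -6.5, "pic50": 5.5}
    candidate = {"logd": 2.2, "log_solubility": -3.8, "pic50": 8.5}
    print(f"weak hit:  {score_compound(weak_hit):.3f}")
    print(f"candidate: {score_compound(candidate):.3f}")
```

In a reinforcement learning setting, a score of this shape would serve as the reward for each generated molecule; the actual criteria and thresholds would come from the project data.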
Learning strategies:
- Integrate active learning and reinforcement learning to refine structure–activity predictions and decision policies during optimization.
- Policy optimization:
  - Evaluate optimization algorithms in ReInvent to identify efficient oracle functions for candidate generation.
Publication:
- Prepare a manuscript detailing methodology, results, and implications, with the goal of submission to a peer‑reviewed journal during or by the end of the placement.
Key Objectives:
- Model benchmarking: Assess the efficiency and accuracy of ReInvent and policy optimization under varying information constraints, across proprietary and public datasets.
- Quantify informational content: Determine the minimum and optimal information needed to progress from hit to candidate drug.
- Impact assessment: Deliver actionable recommendations for generative AI use in drug discovery at AstraZeneca and to the wider community via publication.
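One simple way to make "information needed" measurable is to count how many scored compounds (oracle calls) an optimization run requires before it first reaches a target score. The sketch below assumes exactly that interpretation, which is an illustrative choice and not necessarily the definition the project uses. The function names and the example score history are hypothetical.

```python
def calls_to_threshold(scores, threshold):
    """Return the 1-based number of oracle calls needed to first reach `threshold`.

    `scores` is the sequence of scores in the order the compounds were evaluated.
    Returns None if the threshold is never reached.
    """
    for call_number, score in enumerate(scores, start=1):
        if score >= threshold:
            return call_number
    return None


def best_so_far(scores):
    """Return the running maximum of `scores`, i.e. the optimization curve."""
    curve = []
    best = float("-inf")
    for score in scores:
        best = max(best, score)
        curve.append(best)
    return curve


if __name__ == "__main__":
    # Hypothetical score history from a single optimization run.
    history = [0.10, 0.35, 0.30, 0.62, 0.58, 0.81, 0.79]
    print(calls_to_threshold(history, threshold=0.8))
    print(best_so_far(history))
```

Comparing this count across runs with different amounts of available data or different optimization algorithms gives one basic benchmark of efficiency under information constraints.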
Impact:
Your findings will directly inform ongoing and future projects using ReInvent, accelerating candidate design with improved speed and efficiency. Publication will extend impact to the broader scientific and pharmaceutical communities.
References:
Brown DG. An Analysis of Successful Hit-to-Clinical Candidate Pairs. J Med Chem (2023).
He J et al. Molecular optimization capturing chemist’s intuition using deep neural networks. J Cheminform (2021).
Gummesson Svensson H et al. Utilizing reinforcement learning for de novo drug design. Mach Learn (2024).
Loeffler HH et al. Reinvent 4: Modern AI–driven generative molecule design. J Cheminform (2024).
Mervin L et al. [Property-based scoring and optimization—details to be provided during onboarding.]
Structure:
- Duration: Spring term 2026
- Credits: 30
- Students: 2 (apply together)
Essential Requirements:
- Enrolled in a Master's program in a relevant field.
- Programming in Python
- Knowledge of machine learning
- Knowledge of AI
So, what’s next?
Apply today and take the chance to be part of making a difference, making connections, and gaining the tools and experience to open doors and fulfil your potential. We can’t wait to hear from you!
We welcome your application as soon as possible, and no later than the scheduled closing date of 2 November 2025. If we identify suitable candidates before the scheduled closing date, we reserve the right to withdraw the vacancy earlier than published.
Date Posted: 21-Oct-2025
Closing Date: 02-Nov-2025
Our mission is to build an inclusive and equitable environment. We want people to feel they belong at AstraZeneca and Alexion, starting with our recruitment process. We welcome and consider applications from all qualified candidates, regardless of characteristics. We offer reasonable adjustments/accommodations to help all candidates to perform at their best. If you have a need for any adjustments/accommodations, please complete the section in the application form.